[https://nvbugs/5955188][fix] Fix harmony parsers for agentic coding use cases #12045
dongfengy wants to merge 4 commits into NVIDIA:main from
Conversation
📝 Walkthrough
This change adds comprehensive tool-call handling to the harmony adapter's token streaming logic, including detection, validation, and argument accumulation across the analysis and commentary channels. It enhances harmony-to-OpenAI format conversion with reasoning_content support, improves tool-call recipient parsing with JSON validation, and refines end-of-stream processing.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/serve/harmony_adapter.py`:
- Around lines 1613-1615: The final-chunk path skips emitting the initial
assistant-role SSE chunk when output.token_ids_diff is empty. Update the logic
around harmony_adapter.create_openai_streaming_response so it always emits the
assistant-role chunk before sending the finish/usage chunks, even when
token_ids_diff is empty: factor out (or call) the same assistant-chunk emission
used in the non-done path before checking output.token_ids_diff, so that both
code paths (the call to harmony_adapter.create_openai_streaming_response and the
subsequent finish/usage sending blocks) produce the initial assistant-role SSE
event.
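The suggested fix can be sketched as below. The function names, the SSE chunk shape, and the first_chunk_sent flag are hypothetical illustrations of the pattern, not the adapter's actual API:

```python
# Sketch: factor the assistant-role chunk emission into a helper so the
# done=True path can emit it even when token_ids_diff is empty.
import json


def make_assistant_role_chunk(request_id: str, model: str) -> str:
    """Build the initial SSE chunk that carries role="assistant"."""
    chunk = {
        "id": request_id,
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}],
    }
    return f"data: {json.dumps(chunk)}\n\n"


def finalize_stream(request_id, model, token_ids_diff, first_chunk_sent):
    """Return the SSE events for the final batch, emitting the assistant-role
    chunk first even when token_ids_diff is empty."""
    events = []
    if not first_chunk_sent:
        # Same emission the non-done path uses, before the diff check.
        events.append(make_assistant_role_chunk(request_id, model))
    if token_ids_diff:
        events.append("data: ...content chunks...\n\n")  # placeholder for real conversion
    events.append("data: [DONE]\n\n")
    return events
```

With this shape, a final batch whose token_ids_diff is empty still opens the stream correctly before the terminal chunk.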
- Around lines 1615-1620: The code calls
harmony_adapter.create_openai_streaming_response(...) and then unconditionally
appends a terminal final_response when done=True, which can duplicate a stop
chunk if create_openai_streaming_response already produced one. In the done=True
branches that use remaining_responses and final_response, short-circuit: check
whether remaining_responses already contains a terminal/stop chunk (the same
marker the streaming responses use), and if so do not append another
final_response. Apply the same fix to the similar block around lines 1632-1649
to avoid emitting duplicate terminal stop chunks.
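A minimal sketch of the proposed short-circuit; the terminal marker check and the helper names are assumptions for illustration, not the adapter's real code:

```python
# Sketch: only append the terminal final_response if the already-built
# responses do not contain a stop chunk yet.
def has_terminal_chunk(responses: list) -> bool:
    """True if any already-built SSE chunk carries finish_reason "stop"."""
    return any('"finish_reason": "stop"' in r for r in responses)


def append_final_response(remaining_responses: list, final_response: str) -> list:
    """Append the terminal chunk only when no stop chunk exists yet."""
    if not has_terminal_chunk(remaining_responses):
        remaining_responses.append(final_response)
    return remaining_responses
```

The same guard would be reused in both done=True blocks so neither can emit a second stop chunk.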
- Around lines 214-219: The current filtering in should_filter_tools, the other
call site, and harmony_output_to_openai wrongly blocks built-in tools such as
"browser", "python", and "container" when external tools are present. Implement
a single helper, e.g. is_tool_allowed(request_id, func_name, tool_choice,
available_tools), and use it in both filter locations so that: if tool_choice ==
"none" it returns False for everything; otherwise it always returns True for the
built-ins ("browser", "python", "container"); and for non-built-in functions it
checks membership against available_tools (and respects should_filter_tools).
Update the call sites referencing should_filter_tools / available_tools and
harmony_output_to_openai to use this helper instead.
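A minimal sketch of the proposed helper, assuming the built-in tool names listed in the comment; the request_id parameter from the suggested signature is omitted here for brevity, and available_tools is assumed to be an optional collection of external function names:

```python
# Sketch: single decision point for whether a tool call may pass the filter.
BUILTIN_TOOLS = {"browser", "python", "container"}


def is_tool_allowed(func_name, tool_choice, available_tools):
    """Decide whether a tool call should be kept in the output."""
    if tool_choice == "none":
        return False  # the request disabled all tool calls
    if func_name in BUILTIN_TOOLS:
        return True  # built-ins stay allowed even when external tools exist
    # Non-built-in functions must be declared by the request.
    return available_tools is None or func_name in available_tools
```

Both filter locations would call this helper instead of duplicating the membership check, which is what currently blocks the built-ins.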
- Around lines 205-223: The current branch opens streamed tool calls when
parser.current_channel == "analysis", but the transition logic in
process_token_batch() and process_token_batch_to_messages() only deactivates
previous tool calls when prev_channel == "commentary", causing a tool_id to be
reused or merged across analysis invocations. Update both transition handlers so
they retire (set active=False) prior tool calls when prev_channel is either
"commentary" or "analysis" (i.e., check prev_channel in
("commentary", "analysis")) before creating or fetching a new tool call via
_get_or_create_tool_call, ensuring parser.current_channel /
current_channel_state / _get_or_create_tool_call never reuse an old tool_id from
the other channel.
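_get_or_create_tool_call and the channel-state bookkeeping are internal to harmony_adapter.py; the sketch below only illustrates the proposed transition rule with hypothetical stand-ins for those structures:

```python
# Sketch: retire open tool calls whenever the previous channel could have
# carried one, so a new invocation always gets a fresh tool_id.
import itertools

_ids = itertools.count()
TOOL_CALL_CHANNELS = ("commentary", "analysis")


def on_channel_transition(prev_channel, tool_calls):
    """Deactivate prior tool calls before a new channel segment starts."""
    if prev_channel in TOOL_CALL_CHANNELS:  # not just "commentary"
        for call in tool_calls:
            call["active"] = False


def get_or_create_tool_call(tool_calls):
    """Reuse the active call if any, else open a new one with a fresh id."""
    for call in tool_calls:
        if call["active"]:
            return call
    call = {"id": f"call_{next(_ids)}", "active": True, "arguments": ""}
    tool_calls.append(call)
    return call
```

Checking both channels at the transition point is what prevents an analysis-channel invocation from inheriting the tool_id of the previous call.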
ℹ️ Review info
⚙️ Run configuration: .coderabbit.yaml | Review profile: CHILL | Plan: Pro | Run ID: b94c5807-7826-4021-a5ef-b2eedda0d888
📒 Files selected for processing (1)
- tensorrt_llm/serve/harmony_adapter.py
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Force-pushed from efffc01 to 81f3318
/bot run
PR_Github #38304 [ run ] triggered by Bot.
PR_Github #38304 [ run ] completed.
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
return {"should_stop": "Repeated message"}

# Check for tool calls first, regardless of channel.
# The model may emit tool calls on either "commentary" or "analysis" channel.
This behavior somewhat violates the harmony definition: https://github.com/openai/harmony/blob/main/docs/format.md#receiving-tool-calls
But if the model actually behaves like this, we should add a workaround (WAR) to unblock our work.
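A sketch of what such a workaround might look like; looks_like_tool_call and the "functions." recipient-prefix check are illustrative assumptions, not the adapter's actual code:

```python
# Sketch of the WAR: accept a recipient-bearing header as a tool call on
# either channel instead of only "commentary".
def looks_like_tool_call(channel, recipient):
    """Per the harmony spec, tool calls belong on the commentary channel,
    but the model sometimes emits them on analysis; accept both."""
    return (
        channel in ("commentary", "analysis")
        and recipient is not None
        and recipient.startswith("functions.")
    )
```

This keeps spec-conformant output working while tolerating the off-spec analysis-channel invocations observed in practice.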
/bot run --disable-fail-fast
PR_Github #38380 [ run ] triggered by Bot.
PR_Github #38380 [ run ] completed.
Summary by CodeRabbit
New Features
Improvements
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update the tava architecture diagram if there is a significant design change in the PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.